Main goals of the session

  1. To understand (1) how genetic variation is estimated and (2) how does random mating configure allele and genotype frequencies

  2. To create R scripts from scratch to solve the previous objectives

1. Practicals organization

In this practical, we are going to use the RStudio integrated development environment (IDE) for R. R is a programming language for statistical computing and graphics.

The current document in which we are working is an RMarkdown document. RMarkdown documents are fully reproducible and allow you to combine text, images and code –this time, R programming language.

To render a R Markdown document to a HTML file, you just need to click the Knit button that you’ll see in the RStudio bar. This HTML file can be shared as a report.

You don’t need to render the whole document every time you want to see the result of your R code. You can click in the Run current chunk button or use the keyboard shortcut Ctrl+Alt+C and the result of the code will appear below it.

You will see different icons through the document, the meaning of which is:

: additional or useful information
: a worked example
: a practical exercise
: a space to answer the exercise
: a hint to solve an exercise
: a more challenging exercise

2. R scripts

Scripts which are going to be created in this session:

3. Quick introduction to R functions

The R scripts that we are going to create will consist on R functions that will automatize the calculation of population genetics statistics, such as the genotype and allele frequencies, or testing HWE.

An R function is created by using the keyword function(). The basic syntax of an R function definition is:

function_name <- function(arg_1,  arg_2, ...) {
  # Function body code
}

The different parts of a function are:

Script 1: Estimation of genotype and allele frequencies from a genotyping survey

We have the following genotype information:

Genotypes MM MN NN
Number of individuals 1787 3037 1305

We are going to create an R script to estimate genotype and allele frequencies. We will create a function that takes the information of the three genotypes and outputs the genotype and allele frequencies.

# ESTIMATION OF GENOTYPE AND ALLELE FREQUENCIES FROM A GENOTYPING SURVEY

# One-gene two-alleles: X1 and X2
# Three genotypes:  X11, X12 and X22
# Sample size genotype ij -> Nij

Estimating the allele frequency

Allele frequency can be calculated from the allele count:

\[p = f(A) = \frac{N_A}{2N}\]

\[q = f(a) = \frac{N_a}{2N}\]

Allele frequency can also be calculated from the genotype count:

\[p = f(A) = \frac{N_{AA}+\frac{1}{2}N_{Aa}}{N} = \frac{1787 + \frac{1}{2}3037}{6129} = 0.539 \]

\[q = f(a) = \frac{N_{aa}+\frac{1}{2}N_{Aa}}{N} = \frac{1305 + \frac{1}{2}3037}{6129} = 0.461 \]

Estimating the genotype frequency

Genotype frequency is estimated from the genotype count:

\[f(AA) = \frac{N_{AA}}{N} = \frac{1787}{6129} = 0.292\]

\[f(Aa) = \frac{N_{Aa}}{N} = \frac{3037}{6129} = 0.496 \]

\[f(aa) = \frac{N_{aa}}{N} = \frac{1305}{6129} = 0.213 \]

Once we have clear the structure of a function, we can write the main body code that will perform the operations to calculate the allele frequencies and the genotype frequencies:

# ESTIMATION OF GENOTYPE AND ALLELE FREQUENCIES FROM A GENOTYPING SURVEY
# One-gene two-alleles: X1 and X2
# Three genotypes:  X11, X12 and X22
# Sample size genotype ij -> Nij

# Create the function
gene_freq <- function(N11, N12, N22) {
   
   N = N11 + N12 + N22 # N total sample
   
   # Allele frequencies estimation
   p = (N11 + N12/2)/N # p is the frequency of X1 
   q = 1 - p # q is the frequency of X2 
   
   # Genotype frequencies estimation
   f11 = N11/N # frequency of X11 genotype 
   f12 = N12/N # frequency of X12 genotype 
   f22 = N22/N # frequency of X22 genotype 
   
   # Output
   return (list("N" = N, "N11" = N11, "N12" = N12, "N22" = N22,
                "p" = p, "q" = q,
                "f11" = f11, "f12" = f12, "f22" = f22))
}

Now, we are going to invoke the function with our genotype data:

# Invoke the function
results = gene_freq(1787, 3037, 1305)

# Print results
print(paste0("Sample size = ",results$N," -> N11 = ",results$N11,", N12 = ", results$N12," and N22 = ",results$N22))
## [1] "Sample size = 6129 -> N11 = 1787, N12 = 3037 and N22 = 1305"
print(paste0("p = ", round(results$p,3)," and q = ",round(results$q,3)))
## [1] "p = 0.539 and q = 0.461"
print(paste0("f11 = ",round(results$f11,3), ", f12 = ",round(results$f12,3)," and f22 = ", round(results$f22,3)))
## [1] "f11 = 0.292, f12 = 0.496 and f22 = 0.213"

Script 2. Random mating in a Mendelian population and Hardy-Weinberg equilibrium (HWE)

Random mating describes an ideal situation in which all individuals of one sex are equally potential partners of all members of the opposite sex. Random mating is also referred as panmixia.

Random mating is one of the requirements for the Hardy–Weinberg law to hold.

Demonstration of the Hardy-Weinberg law for one gene with two alleles

The genotypic frequencies in the initial generation are:

\[P = f(AA)\] \[Q = f(Aa)\] \[R = f(aa)\]

Table with all possible crosses:

Cross Cross frequency AA Aa aa
AA\(\times\)AA P2 1 0 0
AA\(\times\)Aa 2PQ ½ ½ 0
AA\(\times\)aa 2PR 0 1 0
Aa\(\times\)Aa Q2 ¼ ½ ¼
Aa\(\times\)aa 2QR 0 ½ ½
aa\(\times\)aa R2 0 0 1

The next generation genotypic frequencies are:

\[P' = f'(AA)\] \[Q' = f'(Aa)\] \[R' = f'(aa)\]

And can be calculated as:

\[P' = P^2 + \frac{2PQ}{2} + \frac{Q^2}{4} = (P + \frac{Q}{2})^2 = p^2\] \[Q' = PQ + 2PR + \frac{Q^2}{2} + QR = 2pq\]

\[R' = R^2 + \frac{2QR}{2} + \frac{Q^2}{4} = (R + \frac{Q}{2})^2 = q^2\]

We are going to create a function that calculate the mating pairs (frequencies probabilities), the genotype probabilities of the progeny and the allele probabilities of progeny from the initial genotype frequencies (f11, f12 and f22).

# MENDELIAN POPULATION
# RANDOM MATING - HWE

# One-gene two-alleles: X1 and X2
# Three initial genotypes:  X11, X12 and X22
# Initial genotype frequencies: f11, f12, f22

# Create the function
random_mating <- function(f11, f12, f22) {
   
   p = f11 + f12/2 # allele frequency of X1
   q = 1 - p # allele frequency of X2
   
   # Mating pairs (frequencies) probabilities
   f11_11 = f11 * f11
   f11_12 = 2 * f11 * f12
   f11_22 = 2 * f11 * f22
   f12_12 = f12 * f12
   f12_22 = 2 * f12 * f22
   f22_22 = f22 * f22
   # Genotype probabilities of progeny (offspring) f11_, f12_, f22_
   f11_ = f11_11 + f11_12/2 + f12_12/4
   f12_ = f11_12/2 + f11_22 + f12_12/2 + f12_22/2
   f22_ = f12_12/4 + f12_22/2 + f22_22
   # Allele probability of progeny p_ and q_
   p_ = f11_ + f12_/2
   q_ = 1 - p_
   # Output
   return (list("p" = p, "q" = q,
                "f11" = f11, "f12" = f12, "f22" = f22,
                "p_" = p_, "q_" = q_,
                "f11_" = f11_, "f12_" = f12_, "f22_" = f22_))
}

Now, we are going to invoke the function with the initial genotypes frequencies P = 0.5, Q = 0, R = 0.5:

# Invoke the function
result <- random_mating(0.5, 0, 0.5)

# Print results
print(paste0("Allele prob. initial generation (p and q) ",result$p, result$q))
## [1] "Allele prob. initial generation (p and q) 0.50.5"
print(paste("Genotype prob. initial generation (f11, f12, f22) ",result$f11, result$f12,result$f22))
## [1] "Genotype prob. initial generation (f11, f12, f22)  0.5 0 0.5"
print(paste("Allele prob. next generation (p' and q') ",result$p_, result$q_))
## [1] "Allele prob. next generation (p' and q')  0.5 0.5"
print(paste("Genotype prob. next generation (f'11, f'12, f'22) ",result$f11_, result$f12_,result$f22_))
## [1] "Genotype prob. next generation (f'11, f'12, f'22)  0.25 0.5 0.25"
print(paste("Genotype prob. next generation according HWE ", result$p_ * result$p_, 2*result$p_ * result$q_,result$q_ * result$q_))
## [1] "Genotype prob. next generation according HWE  0.25 0.5 0.25"

Exercise 1 | Genotypic frequencies of two populations

Two populations start with the following genotypic frequencies:

  • Population 1: P1 = f(AA) = 0.24, Q1 = f(Aa) = 0.32, R1 = f(aa) = 0.44
  • Population 2: P2 = f(AA) = 0.33, Q2 = f(Aa) = 0.14, R2 = f(aa) = 0.53

Use the previous script (random_mating) to calculate the genotypic frequencies of the next generation (P’1, Q’1, R’1 and P’2, Q’2, R’2) if random mating exist.

Answer:
# initial genotypic frequencies population 1 
P1 <- 0.24
Q1 <- 0.32
R1 <- 0.44

# initial genotypic frequencies population 2
P2 <- 0.33
Q2 <- 0.14
R2 <- 0.53

next_1 <- random_mating(P1, Q1, R1)
print(paste("Genotype prob. next generation (P’1, Q’1, R’1) ",next_1$f11_, next_1$f12_,next_1$f22_))
## [1] "Genotype prob. next generation (P’1, Q’1, R’1)  0.16 0.48 0.36"
print(paste("Genotype prob. next generation according HWE ", next_1$p_ * next_1$p_, 2*next_1$p_ * next_1$q_,next_1$q_ * next_1$q_))
## [1] "Genotype prob. next generation according HWE  0.16 0.48 0.36"
next_2 <- random_mating(P2, Q2, R2)
print(paste("Genotype prob. next generation (P’2, Q’2, R’2) ",next_2$f11_, next_2$f12_,next_2$f22_))
## [1] "Genotype prob. next generation (P’2, Q’2, R’2)  0.16 0.48 0.36"
print(paste("Genotype prob. next generation according HWE ", next_2$p_ * next_2$p_, 2*next_2$p_ * next_2$q_,next_2$q_ * next_2$q_))
## [1] "Genotype prob. next generation according HWE  0.16 0.48 0.36"

Exercise 2 | Impact of assortative mating on genotype and allele frequencies

Modify the previous script to compute the impact of assortative mating on the genotype and allele frequencies. Consider the following scenarios:

  • Complete assortative positive mating among genotypes (a given genotype only mates with its same genotype)
  • Complete assortative positive mating among phenotypes (a given genotype only mates with its same phenotype)
  • Complete assortative negative mating among genotypes (a given genotype never mates with its same genotype and mate at random with other genotypes)
  • Complete assortative negative mating among phenotypes (a given phenotype never mates with its same phenotype)

Hint: Consider that individuals AA and Aa have the same phenotype, while individuals aa have a different one (A is dominant over a).

Hint: To take into account assortative mating, change the mating pairs probabilities. Here you have solved the first scenario as an example:

# Mating pairs (frequencies) probabilities
ProbAllMatings= f11*f11 + f12*f12 + f22*f22
f11_11 = f11*f11/ProbAllMatings
f11_12 = 0
f11_22 = 0
f12_12 = f12*f12/ProbAllMatings
f12_22 = 0
f22_22 = f22*f22/ProbAllMatings
Answer:
# Complete assortative positive mating among genotypes
a1 <- function(f11, f12, f22) {
   
   p = f11 + f12/2 # allele frequency of X1
   q = 1 - p # allele frequency of X2
   
   # Mating pairs (frequencies) probabilities
   ProbAllMatings= f11*f11 + f12*f12 + f22*f22
   f11_11 = f11*f11/ProbAllMatings
   f11_12 = 0
   f11_22 = 0
   f12_12 = f12*f12/ProbAllMatings
   f12_22 = 0
   f22_22 = f22*f22/ProbAllMatings
   # Genotype probabilities of progeny (offspring) f11_, f12_, f22_
   f11_ = f11_11 + f11_12/2 + f12_12/4
   f12_ = f11_12/2 + f11_22 + f12_12/2 + f12_22/2
   f22_ = f12_12/4 + f12_22/2 + f22_22
   # Allele probability of progeny p_ and q_
   p_ = f11_ + f12_/2
   q_ = 1 - p_
   # Output
   return (list("p" = p, "q" = q,
                "f11" = f11, "f12" = f12, "f22" = f22,
                "p_" = p_, "q_" = q_,
                "f11_" = f11_, "f12_" = f12_, "f22_" = f22_))
}

#  Complete assortative positive mating among phenotypes
a2 <- function(f11, f12, f22) {
   
   p = f11 + f12/2 # allele frequency of X1
   q = 1 - p # allele frequency of X2
   
   # Mating pairs (frequencies) probabilities
   ProbAllMatings= f11*f11 + 2*f11*f12 + f12*f12 + f22*f22
   f11_11 = f11*f11/ProbAllMatings
   f11_12 = 2*f11*f12/ProbAllMatings
   f11_22 = 0
   f12_12 = f12*f12/ProbAllMatings
   f12_22 = 0
   f22_22 = f22*f22/ProbAllMatings
   # Genotype probabilities of progeny (offspring) f11_, f12_, f22_
   f11_ = f11_11 + f11_12/2 + f12_12/4
   f12_ = f11_12/2 + f11_22 + f12_12/2 + f12_22/2
   f22_ = f12_12/4 + f12_22/2 + f22_22
   # Allele probability of progeny p_ and q_
   p_ = f11_ + f12_/2
   q_ = 1 - p_
   # Output
   return (list("p" = p, "q" = q,
                "f11" = f11, "f12" = f12, "f22" = f22,
                "p_" = p_, "q_" = q_,
                "f11_" = f11_, "f12_" = f12_, "f22_" = f22_))
}
  
# Complete assortative negative mating among genotypes
a3 <- function(f11, f12, f22) {
   
   p = f11 + f12/2 # allele frequency of X1
   q = 1 - p # allele frequency of X2
   
   # Mating pairs (frequencies) probabilities
   ProbAllMatings= 2*f11*f12 + 2*f11*f22 + 2*f12*f22
   f11_11 = 0
   f11_12 = 2*f11*f12/ProbAllMatings
   f11_22 = 2*f11*f22/ProbAllMatings
   f12_12 = 0
   f12_22 = 2*f12*f22/ProbAllMatings
   f22_22 = 0
   # Genotype probabilities of progeny (offspring) f11_, f12_, f22_
   f11_ = f11_11 + f11_12/2 + f12_12/4
   f12_ = f11_12/2 + f11_22 + f12_12/2 + f12_22/2
   f22_ = f12_12/4 + f12_22/2 + f22_22
   # Allele probability of progeny p_ and q_
   p_ = f11_ + f12_/2
   q_ = 1 - p_
   # Output
   return (list("p" = p, "q" = q,
                "f11" = f11, "f12" = f12, "f22" = f22,
                "p_" = p_, "q_" = q_,
                "f11_" = f11_, "f12_" = f12_, "f22_" = f22_))
}
  
# Complete assortative negative mating among phenotypes
a4 <- function(f11, f12, f22) {
   
   p = f11 + f12/2 # allele frequency of X1
   q = 1 - p # allele frequency of X2
   
   # Mating pairs (frequencies) probabilities
   ProbAllMatings= 2*f11*f22 + 2*f12*f22
   f11_11 = 0
   f11_12 = 0
   f11_22 = 2*f11*f22/ProbAllMatings
   f12_12 = 0
   f12_22 = 2*f12*f22/ProbAllMatings
   f22_22 = 0
   # Genotype probabilities of progeny (offspring) f11_, f12_, f22_
   f11_ = f11_11 + f11_12/2 + f12_12/4
   f12_ = f11_12/2 + f11_22 + f12_12/2 + f12_22/2
   f22_ = f12_12/4 + f12_22/2 + f22_22
   # Allele probability of progeny p_ and q_
   p_ = f11_ + f12_/2
   q_ = 1 - p_
   # Output
   return (list("p" = p, "q" = q,
                "f11" = f11, "f12" = f12, "f22" = f22,
                "p_" = p_, "q_" = q_,
                "f11_" = f11_, "f12_" = f12_, "f22_" = f22_))
} 

Script 3: HWE χ2 test on counts from a genotyping survey

Deviations from Hardy-Weinberg equilibrium for a genetic variant can be calculated using the χ2 test, using the observed genotype frequencies obtained from the empirical data and the expected genotype frequencies obtained using the Hardy-Weinberg proportions (HWP). The null hypothesis is that the population is in HWP, and the alternative hypothesis is that the population is not in HWP.

Genotype MM MN NN Total
Number of individuals 1787 3037 1305 6129
Number of alleles M 3574 3037 0 6611
Number of alleles N 0 3037 2610 5647
Number of alleles M+N 3574 6074 2610 12258

First, we calculate the allele frequency of both alleles:

\[p = \frac{N_M}{2N} = \frac{6611}{12258} = 0.539\] \[p = \frac{N_N}{2N} = \frac{5647}{12258} = 0.461\]

Then the HWE expected frequency is estimated as:

\[E(AA) = p^2 \times N = 0.539^2 \times 6129 = 1782.7\]

\[E(Aa) = 2pq \times N = 2 \times 0.539 \times 0.461 \times 6129 = 33045.6\] \[E(aa) = q^2 \times N = 0.461^2 \times 6129 = 1300.7\]

The χ2 test formula is:

\[ \chi^2 = \sum{\frac{(Expected \; number - Observed \; number)^2}{Expected \; number}} = \] \[ \frac{(1782.7 – 1787)^2}{1782.7} + \frac{(33045.6 – 3037)^2}{33045} + \frac{(1300.7 – 1305)^2}{1300} = 0.048\]

We have to compare this value with the χ2 distribution to see if its significant. The probability of exceeding the critical χ2 with a 0.05 significance and 1 degree of freedom is χ20.05;1= 3.841. Because the calculated χ2 value is lower than the critical value, the observed genotypes are in HWE.

Performing a χ2 in R

n R, the critical value can be obtained with the following function:

qchisq(0.95, df = 1)
## [1] 3.841459

The probability of your value (Chi_value = 0.048) can be obtained with:

pchisq(0.048, df = 1, lower = F)
## [1] 0.8265807
# HWE CHI-SQUARE TEST ON COUNTS FROM A GENOTYPING SURVEY 

# One-gene two-alleles: X1 and X2
# Three initial genotypes:  X11, X12 and X22
# Sample size genotype ij -> Nij 

# Create the function
HWE_test <- function(N11, N12, N22) {

   N = N11 + N12 + N22 # N total sample
   
   # Allele frequencies estimation
   p = (N11 + N12/2)/N # p is the frequency of X1 
   q = 1 - p # q is the frequency of X2 
   
   #Genotype frequencies
   f11 = N11/N # frequency of X11 genotype
   f12 = N12/N # frequency of X12 genotype
   f22 = N22/N # frequency of X22 genotype

   # Expected HWE
   E11 = N*p^2 
   E12 = N*2*p*q
   E22 = N-E11-E12
   
   # Chi-square value and probability
   Chi_value = (N11-E11)^2/E11 + (N12-E12)^2/E12 + (N22-E22)^2/E22
   prob = pchisq(Chi_value, df = 1, lower = F)

   # Output
   return (list("N" = N, "N11" = N11, "N12" = N12, "N22" = N22,
                "p" = p, "q" = q,
                "f11" = f11, "f12" = f12, "f22" = f22,
                "Chi_value" = Chi_value, "prob" = prob))
}

Now, we are going to invoke the function with our genotype data:

# Invoke the function
results <- HWE_test(1787,3037,1305)

# Print results
print(paste("Sample size = ", results$N," -> N11 = ", results$N11,", N12 = ", results$N12,",N22 = ", results$N22))
## [1] "Sample size =  6129  -> N11 =  1787 , N12 =  3037 ,N22 =  1305"
print(paste("p = ",round(results$p,digits=3)," and q = ", round(results$q,digits=3) ))
## [1] "p =  0.539  and q =  0.461"
print(paste("f11 = ",round(results$f11,digits=3),", f12 = ",round(results$f12,digits=3)," and f22 = ", round(results$f22,digits=3)))
## [1] "f11 =  0.292 , f12 =  0.496  and f22 =  0.213"
print(paste("Chi-square value = ",round(results$Chi_value,3),"; Prob. = ",  round(results$prob,3)))
## [1] "Chi-square value =  0.048 ; Prob. =  0.826"

Exercise 3 | Analysis of a Navajo population

361 Navajo individuals were analyzed in New Mexico for the MN locus: 305 MM types, 52 MN and 4 NN. Use the previous script to answer the next questions:

  • Test for the deviation from HWE and its significance.
  • What proportion of children of women of phenotype N are expected to present the maternal phenotype?
  • What proportion of children of heterozygous MN women are expected to have the maternal phenotype?

Note: M and N are codominant.

Answer:
navajo <- HWE_test(305, 52, 4)
# Test for the deviation from HWE and its significance.
# We check if chi square value is grater or smaller than critical value
critic_value = qchisq(0.95, df = 1)
chisq_value = round(navajo$Chi_value,3)
# if chi square value is smaller we know that exist significant deviance from Hardy-Weinberg Equilibrium
if (chisq_value > critic_value){
  print("There is a significant deviance from Hardy-Weinberg Equilibrium.") # reject H0
} else {
  print("There is no significant deviance from Hardy-Weinberg Equilibrium.") # accept H0
}
## [1] "There is no significant deviance from Hardy-Weinberg Equilibrium."
# What proportion of children of women of phenotype N are expected to present the maternal phenotype?
print(paste("Proportion of children of women with phenotype N expected to have maternal phenotype (NN) = ", round(navajo$f22, 3)))
## [1] "Proportion of children of women with phenotype N expected to have maternal phenotype (NN) =  0.011"
# Proportion of children of heterozygous MN women expected to have maternal phenotype (NN or MN)
print(paste("Proportion of children of heterozygous MN women expected to have the maternal phenotype = ", round(navajo$f12 + navajo$f22, 3)))
## [1] "Proportion of children of heterozygous MN women expected to have the maternal phenotype =  0.155"

Exercise 4 |Analysis of three alleles

Three alleles A, B, anc C of the red cell acid phosphatase locus were found in a simple of 178 English individuals. The number of individuals found with six possible genotypes are given below.

Genotype AA AB AC BB BC CC
Number 17 86 5 61 9 0

Modify the previous script to do the following:

  • Estimate the frequencies of alleles A, B, and C
  • Use a χ2 test to examine whether these data are consistent with HWE

Hint: Modify the code to take into account three alleles (and six genotypes) instead of two alleles (and three genotypes)

# One-gene  3-alleles: X1, X2, X3
# Genotypes:  X11, X12, X13, X22, X23, X33
# Sample size genotype ij -> Nij
# The input data are the counts of each genotype ordered
Answer:
HWE_test_3 <- function(N11, N12, N13, N22, N23, N33) {

   N = N11 + N12 + N13 + N22 + N23 + N33 # N will be the sample size
   
   # Allele frequencies estimation
   p = (N11 + N12/2 + N13/2)/N # frequency of allele N1 
   q = (N22 + N12/2 + N23/2)/N # frequency of allele N2
   r = 1-p-q # frequency of allele N3
  
   # We can divide to know the enotype frequencies
   f11 = N11/N
   f12 = N12/N
   f13 = N13/N
   f22 = N22/N
   f23 = N23/N 
   f33 = N33/N 

   # Expected HWE
   E11 = N*p^2 
   E12 = N*2*p*q
   E13 = N*2*p*r
   E22 = N*q^2
   E23 = N*2*q*r
   E33 = N*r^2
   
   # Chi-square value and probability
   Chi_val = (N11-E11)^2/E11 + (N12-E12)^2/E12 + (N13-E13)^2/E13 + (N22-E22)^2/E22 + (N23-E23)^2/E23 + (N33-E33)^2/E33
   prob = pchisq(Chi_val, df = 2, lower = F)

   # Output
   return (list("N" = N, "N11" = N11, "N12" = N12, "N13" = N13, "N22" = N22, "N23" = N23, "N33" = N33,
                "p" = p, "q" = q, "r" = r,
                "f11" = f11, "f12" = f12, "f13" = f13, "f22" = f22, "f23" = f23, "f33" = f33,
                "Chi_value" = Chi_val, "prob" = prob))
}


# Estimate the frequencies of alleles A, B, and C
hwe <- HWE_test_3(17, 86, 5, 61, 9, 0)

print(paste("Frequency of A:", round(hwe$p, 3)))
## [1] "Frequency of A: 0.351"
print(paste("Frequency of B:", round(hwe$q, 3)))
## [1] "Frequency of B: 0.61"
print(paste("Frequency of C:", round(hwe$r, 3)))
## [1] "Frequency of C: 0.039"
# Use a χ2 test to examine whether these data are consistent with HWE
critical_value = qchisq(0.95, df = 2)
chisq_value = round(hwe$Chi_value, 3)

if (chisq_value > critical_value){
  print("The data  not follow with Hardy-Weinberg Equilibrium.")
} else {
  print("The data follow Hardy-Weinberg Equilibrium.")
}
## [1] "The data follow Hardy-Weinberg Equilibrium."

Script 4. Estimation of allele and genotype frequencies from counts of dominant and recessive phenotypes assuming HWE

Consider an allele A which is totally dominate over a. That means that the dominant phenotype will show if either the homozygous AA or heterozygous Aa genotypes occur. This type of inheritance occurs with the Rh human blood types, for example. There are two alleles: D and d. Individuals who are homozygous dominant (DD) or heterozygous (Dd) are Rh+, and therefore, have Rh antigens). Those who are homozygous recessive (dd) are Rh- (they don’t have Rh antigens).

To calculate genotype frequencies from counts of dominant (or recessive) phenotypes, we have to consider the following. The recessive phenotype is controlled by the homozygous aa genotype, while the frequency of the dominant phenotype equals the sum of the frequencies of AA and Aa. Therefore, the recessive phenotype is simply the frequency of aa:

Phenotype Genotype Number of individuals
Rh+ AA or Aa 170
Rh- aa 30

From the phenotypic frequencies we know that:

\[f(AA) + f(Aa) = \frac{N{A\_}}{N} = \frac{170}{200} = 0.8 \]

\[ f(aa) = \frac{N{aa}}{N} = \frac{30}{200} = 0.15 \]

We use the frequency of the recessive genotype to estimate q:

\[ q = \sqrt{f(aa)} = \sqrt{0.15} = 0.387 \] \[ p = 1 - q = 0.613 \]

Then, the genotype frequencies can be estimated according to the HWE:

\[ f(AA) = p^2 = 0.613^2 = 0.375 \] \[ f(Aa) = 2pq = 2 \times 0.613 \times 0.387 = 0.475 \]

\[ f(aa) = q^2 = 0.387^2 = 0.15 \]

Some examples of Mendelian recessive phenotypes in humans:

# ESTIMATION OF ALLELE AND GENOTYPE FREQUENCIES FROM COUNTS OF DOMINANT AND
# RECESSIVE PHENOTYPES ASSUMING HWE
# One-gene two-alleles with dominance: X1 > X2
# Three genotypes:  X11, X12 and X22
# Two phenotypes: N1_ and N22
# Sample size genotype ij -> Nij

# Create the function
freq_dominance <- function(NA_, Naa) {
    
    N = NA_ + Naa # N total sample
    
    # Phenotypic frequencies
    faa = Naa/N # Frequency genotype aa
    
    # Allele frequency
    q = sqrt(faa)
    p = 1 - q;
    
    # Genotype frequencies according HWE
    f11 = p^2 
    f12 = 2*p*q
    f22 = q^2
    
    # Output
   return (list("N" = N, "NA_" = NA_, "Naa" = Naa, 
                "p" = p, "q" = q,
                "f11" = f11, "f12" = f12, "f22" = f22))
}

Now, we are going to invoke the function with our genotype data:

# Invoke the function
results <- freq_dominance(170, 30)

# Print results
print(paste("Sample size = ", results$N," -> NA_ = ",results$NA_,",Naa = ", results$Naa))
## [1] "Sample size =  200  -> NA_ =  170 ,Naa =  30"
print(paste("p = ",round(results$p,digits=3)," and q = ", round(results$q,digits=3) ))
## [1] "p =  0.613  and q =  0.387"
print(paste("fAA = ",round(results$f11,digits=3),", fAa = ",
round(results$f12,digits=3)," and faa = ", round(results$f22,digits=3) ))
## [1] "fAA =  0.375 , fAa =  0.475  and faa =  0.15"

Exercise 5 | Solve with the previous script

What is the frequency of heterozygotes Aa in a population with random mating in which the frequency of the dominant phenotype is 0.19?

Answer:
results <- freq_dominance(0.19 * 200, 0.81 * 200)

# Print the frequency of heterozygotes Aa
print(paste("Frequency of heterozygotes Aa: ", round(results$f12, digits = 3)))
## [1] "Frequency of heterozygotes Aa:  0.18"

Exercise 6 | Solve with the previous script

What is the frequency of heterozygotes Aa in a population with random mating if the frequency of the recessive phenotype aa is 0.09?

Answer:
results <- freq_dominance(0.91 * 200, 0.09 * 200)

# Print the frequency of heterozygotes Aa
print(paste("Frequency of heterozygotes Aa: ", round(results$f12, digits = 3)))
## [1] "Frequency of heterozygotes Aa:  0.42"

Exercise 7 | Frequency of cystic fibrosis

1 in 1700 US Caucasian newborns have cystic fibrosis. C is the normal allele, dominant over the recessive c. Individuals must be homozygous for the recessive allele to have the disease. Calculate the following, using the previous script:

  • The percent of the above population have cystic fibrosis.
  • The frequency of the recessive allele in the population.
  • The frequency of the dominant allele in the population.
  • The percentage of heterozygous individuals (carriers) in the population
Answer:
freq_cc <- 1/1700 # 1 in 1700 US Caucasian newborns have cystic fibrosis.
result <- freq_dominance(1-freq_cc, freq_cc)


print(paste("Percent of population with cystic fibrosis:", round(freq_cc*100, 3), "%"))
## [1] "Percent of population with cystic fibrosis: 0.059 %"
print(paste("Frequency of recessive allele in the population:", round(result$q, 5)))
## [1] "Frequency of recessive allele in the population: 0.02425"
print(paste("Frequency of dominant allele in the population:", round(result$p, 5)))
## [1] "Frequency of dominant allele in the population: 0.97575"
print(paste("Percentage of carriers (heterozygous individuals) in the population:", round(result$f12*100, 3), "%"))
## [1] "Percentage of carriers (heterozygous individuals) in the population: 4.733 %"

Exercise 8 | Albino frequency

In the human population, one in every 10,000 babies is albino. Using the previous script, calculate the approximate frequency of the allele produced by albinism, knowing that it is autosomal and recessive. What proportion of non-albino individuals are carriers?

Answer:
# frequency of albino phenotype (aa)
freq_albino <- 1/10000

result <- freq_dominance(1-freq_albino, freq_albino)

# The frequency of the recessive allele in the population.
print(paste("Frequency of recessive allele:", round(result$q, 5)))
## [1] "Frequency of recessive allele: 0.01"
# The percentage of heterozygous individuals (carriers) in the population
print(paste("Percentage of peopple non-albino individuals that are carriers:", round(result$f12*100, 3), "%"))
## [1] "Percentage of peopple non-albino individuals that are carriers: 1.98 %"

Final problems

Problem 1 | Impact of deviation of Mendelian segregation 1:1 on HWE

The HWE is a characteristic of a random mating population, if no external evolutionary forces are acting on it (for example, natural selection, migration or mutation). We are going to study what happens when the Mendelian law of equal segregation is not met.

Consider that the heterozygote “Aa” segregates in proportion 2(A):1(a), instead of an equal proportion (1)A:(1)a.

Cross Cross frequency AA Aa aa
AA × AA
AA × Aa 2PQ
AA × aa 2PR
Aa × Aa
Aa × aa 2QR
aa × aa

\[ P' = ... \] \[ Q' = ... \] \[ R' = ... \]

Modify the random_mating() function with these new probabilities and answer these questions:

for (i in 1:25) {

}

To better understand whats happening with the allele frequencies, we are going to plot how the allele frequency of A changes over generations. To do that, we are going to create an empty vector and we will store the values of p to generate the plot.

p <- vector()

for (i in 1:25) {
  # call the function
  result <- random_mating ...
  p[i] <- result$p
}
Answer:
random_mating <- function(f11, f12, f22) {
  p = f11 + f12/2 # allele frequency of X1
  q = 1 - p # allele frequency of X2

  # Mating pairs (frequencies) probabilities
  f11_11 = f11*f11
  f11_12 = 2*f11*f12
  f11_22 = 2*f11*f22
  f12_12 = f12*f12
  f12_22 = 2*f12*f22
  f22_22 = f22*f22
  # Genotype probabilities of progeny (offspring) f11_, f12_, f22_
  f11_ = f11_11 + f11_12/2 + f12_12/4
  f12_ = f11_12/2 + f11_22 + f12_12/2 + f12_22/2
  f22_ = f12_12/4 + f12_22/2 + f22_22
  # Allele probability of progeny p_ and q_
  p_ = f11_ + f12_/2
  q_ = 1 - p_
  # Output
  return (list("p" = p, "q" = q,
               "f11" = f11, "f12" = f12, "f22" = f22,
               "p_" = p_, "q_" = q_,
               "f11_" = f11_, "f12_" = f12_, "f22_" = f22_))
}

p <- vector()
generations <- 1:25
f11 <- 0.5
f12 <- 0
f22 <- 0.5

res <- random_mating(f11, f12, f22)
print(paste("Allele prob. next generation (p' and q') ", res$p_, res$q_))
## [1] "Allele prob. next generation (p' and q')  0.5 0.5"
print(paste("Genotype prob. next generation (f'11, f'12, f'22) ",res$f11_, res$f12_,res$f22_))
## [1] "Genotype prob. next generation (f'11, f'12, f'22)  0.25 0.5 0.25"
for (i in 1:25) {
  result <- random_mating(f11, f12, f22)
  f11 <- result$f11_
  f12 <- result$f12_
  f22 <- result$f22_
  p[i] <- result$p
}

plot(generations, p, type = "l", xlab = "Generations", ylab = "Allele Frequency", main = "Change in Allele Frequency over time")

Problem 2 | Random mating and HWE for a X-linked gene

Modify the function random_mating() to compute the genotype and allele frequencies for the case of a X-linked gene.

Hint: To consider an X-linked loci, you have to take into account that males are hemizygous for the locus, so simply count males as having only one allele for each frequency calculation.

P = f(AA)       R = f(A)
H = f(Aa)       S = f(a)
Q = f(aa)
  • Study the evolutionary dynamics: suppose that we have the extreme initial case pm(0) = 1 and pf(0) = 0. Plot how pm and pf changes over 25 generations. What pattern do you observe?

  • What equilibrium is eventually reached?

Answer:
# Define the random mating function with updated crossing probabilities
random_mating <- function(f11, f12, f22, f1, f2) {
   
   p = f11 + f12/2 + f1/2  # allele frequency of X1
   q = 1 - p  # allele frequency of X2
   
   # Mating pairs (frequencies) probabilities
   all_prob = (f1 * f11) + (2 * f1 * f12) + (f1 * f22) + (2* f2 * f12) + (f2 * f22) + (f2 * f11)
   f1_11 = f1 * f11 / all_prob
   f1_12 = 2 * f1 * f12 / all_prob
   f1_22 = f1 * f22 / all_prob
   f2_12 = 2* f2 * f12 / all_prob
   f2_22 = f2 * f22 / all_prob
   f2_11 = f2 * f11 / all_prob
   
   # Genotype probabilities of progeny (offspring) f11_, f12_, f22_
   f11_ = f1_11/2 + f1_12/4
   f12_ = f1_12/4 + f1_22/2 + f2_12/4 + f2_11/2
   f22_ = f2_12/4 + f2_22/2 
   f1_ = f1_12/4 + f1_11/2 + f2_12/4 + f2_11/2
   f2_ = f1_12/4 + f1_22/2 + f2_12/4 + f2_22/2
   
   # Allele probability of progeny p_ and q_
   p_ = f11_ + f12_/2 + f1_/2
   q_ = 1 - p_
   
   # Output
   return (list("p" = p, "q" = q,
                "f11" = f11, "f12" = f12, "f22" = f22,
                "p_" = p_, "q_" = q_,
                "f11_" = f11_, "f12_" = f12_, "f22_" = f22_, "f1_" = f1_, "f2_" = f2_))
}

# Initial frequencies
f11 <- 0
f12 <- 0
f22 <- 1
f1 <- 1
f2 <- 0

# Empty vectors to store allele frequencies over generations
p <- vector()
q <- vector()

# Calculate allele frequencies over 25 generations
for (i in 1:25) {
  result_final2 <- random_mating(f11, f12, f22, f1, f2)
  f11 <- result_final2$f11_
  f12 <- result_final2$f12_
  f22 <- result_final2$f22_
  f1 <- result_final2$f1_
  f2 <- result_final2$f2_
  
  p[i] <- result_final2$p
  q[i] <- result_final2$q
}

plot(generations, p, type = "l", col = "yellow", xlab = "Generation", ylab = "Allele Frequency", main = "Change in Allele Frequency of A (p) over Generations")

plot(generations, q, type = "l", col = "gray", xlab = "Generation", ylab = "Allele Frequency", main = "Change in Allele Frequency of a (q) over Generations")


Deliver the RMarkdown in Aul@-ESCI

Deliver this document in Aul@-ESCI with your answers

Deadline: 14 April 2024

Doubts? and .